1 Project 1: Creating a publication-grade plot

1.1 Task

Applying what you’ve learned, create an economics- or social-related plot that is polished with the appropriate annotations, aesthetics and some simple commentary.

1.2 Data Explanation

To fulfill the task, I use the “Brazilian E-Commerce Public Dataset by Olist” from kaggle.com. It is a public dataset provided by Olist Store, a Brazilian e-commerce store. The dataset contains information on about 100k orders placed from 2016 to 2018 at multiple marketplaces in Brazil.

The picture below shows the relationships among the tables in the dataset. Image source: https://www.kaggle.com/olistbr/brazilian-ecommerce

1.4 Data Cleansing

##                             product_id              product_category_name
##  00066f42aeeb9f3007548bb9d3f33c38:    1   cama_mesa_banho      : 3029    
##  00088930e925c41fd95ebfe695fd2655:    1   esporte_lazer        : 2867    
##  0009406fd7479715e4bef61dd91f2462:    1   moveis_decoracao     : 2657    
##  000b8f95fcb9e0096488278317764d19:    1   beleza_saude         : 2444    
##  000d9be29b5207b54e86aa1b1ac54872:    1   utilidades_domesticas: 2335    
##  0011c512eb256aa0dbbb544d8dffcf6e:    1   automotivo           : 1900    
##  (Other)                         :32945   (Other)              :17719    
##  product_name_lenght product_description_lenght product_photos_qty
##  Min.   : 5.00       Min.   :   4.0             Min.   : 1.000    
##  1st Qu.:42.00       1st Qu.: 339.0             1st Qu.: 1.000    
##  Median :51.00       Median : 595.0             Median : 1.000    
##  Mean   :48.48       Mean   : 771.5             Mean   : 2.189    
##  3rd Qu.:57.00       3rd Qu.: 972.0             3rd Qu.: 3.000    
##  Max.   :76.00       Max.   :3992.0             Max.   :20.000    
##  NA's   :610         NA's   :610                NA's   :610       
##  product_weight_g product_length_cm product_height_cm product_width_cm
##  Min.   :    0    Min.   :  7.00    Min.   :  2.00    Min.   :  6.0   
##  1st Qu.:  300    1st Qu.: 18.00    1st Qu.:  8.00    1st Qu.: 15.0   
##  Median :  700    Median : 25.00    Median : 13.00    Median : 20.0   
##  Mean   : 2276    Mean   : 30.82    Mean   : 16.94    Mean   : 23.2   
##  3rd Qu.: 1900    3rd Qu.: 38.00    3rd Qu.: 21.00    3rd Qu.: 30.0   
##  Max.   :40425    Max.   :105.00    Max.   :105.00    Max.   :118.0   
##  NA's   :2        NA's   :2         NA's   :2         NA's   :2
## 'data.frame':    71 obs. of  2 variables:
##  $ product_category_name        : Factor w/ 71 levels "agro_industria_e_comercio",..: 12 45 9 14 55 33 62 71 69 65 ...
##  $ product_category_name_english: Factor w/ 71 levels "agro_industry_and_commerce",..: 44 16 6 8 40 66 60 50 69 71 ...
## 'data.frame':    3095 obs. of  4 variables:
##  $ seller_id             : Factor w/ 3095 levels "0015a82c2db000af6aaaf3ae2ecb0532",..: 623 2541 2506 2326 982 2343 2749 322 1458 2490 ...
##  $ seller_zip_code_prefix: int  13023 13844 20031 4195 12914 20920 55325 16304 1529 80310 ...
##  $ seller_city           : Factor w/ 611 levels "04482255","abadia de goias",..: 102 343 451 519 81 451 84 403 519 160 ...
##  $ seller_state          : Factor w/ 23 levels "AC","AM","BA",..: 23 23 17 23 23 17 14 23 23 16 ...
## 'data.frame':    103886 obs. of  5 variables:
##  $ order_id            : Factor w/ 99440 levels "00010242fe8c5a6d1ba2dd792cb16214",..: 71446 65633 14657 72396 25967 16046 46192 23843 12217 2146 ...
##  $ payment_sequential  : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ payment_type        : Factor w/ 5 levels "boleto","credit_card",..: 2 2 2 2 2 2 2 2 2 1 ...
##  $ payment_installments: int  8 1 1 8 2 2 1 3 6 1 ...
##  $ payment_value       : num  99.3 24.4 65.7 107.8 128.4 ...
## 'data.frame':    112650 obs. of  7 variables:
##  $ order_id           : Factor w/ 98666 levels "00010242fe8c5a6d1ba2dd792cb16214",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ order_item_id      : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ product_id         : Factor w/ 32951 levels "00066f42aeeb9f3007548bb9d3f33c38",..: 8629 29598 25668 15323 22080 30848 18182 11123 6385 9013 ...
##  $ seller_id          : Factor w/ 3095 levels "0015a82c2db000af6aaaf3ae2ecb0532",..: 855 2679 1118 1920 2698 1224 1372 1091 1997 2231 ...
##  $ shipping_limit_date: Factor w/ 93318 levels "2016-09-19 00:15:34",..: 24067 7194 45696 89769 1641 9384 39631 81823 60182 81339 ...
##  $ price              : num  58.9 239.9 199 13 199.9 ...
##  $ freight_value      : num  13.3 19.9 17.9 12.8 18.1 ...
## 'data.frame':    100000 obs. of  7 variables:
##  $ review_id              : Factor w/ 99173 levels "0001239bc1de2e33cb583967c2ca4c67",..: 48013 50030 13322 89282 95962 8159 3060 48292 63672 52194 ...
##  $ order_id               : Factor w/ 99441 levels "00010242fe8c5a6d1ba2dd792cb16214",..: 45052 63915 97051 39397 55031 68818 88966 75802 60275 72099 ...
##  $ review_score           : int  4 5 5 5 5 1 5 5 5 4 ...
##  $ review_comment_title   : Factor w/ 4601 levels "","-","- Luminária de Mesa pelic",..: 1 1 1 1 1 1 1 1 1 3805 ...
##  $ review_comment_message : Factor w/ 36923 levels "","'entrega feita dentro do prazo",..: 1 1 1 31778 25773 1 1 1 1 4357 ...
##  $ review_creation_date   : Factor w/ 637 levels "2016-10-02 00:00:00",..: 413 463 442 147 454 497 230 620 170 536 ...
##  $ review_answer_timestamp: Factor w/ 99010 levels "2016-10-07 18:32:28",..: 45452 56963 51957 5910 54897 66179 15081 93211 7935 75810 ...
##   review_id           order_id          review_score  
##  Length:100000      Length:100000      Min.   :1.000  
##  Class :character   Class :character   1st Qu.:4.000  
##  Mode  :character   Mode  :character   Median :5.000  
##                                        Mean   :4.071  
##                                        3rd Qu.:5.000  
##                                        Max.   :5.000  
##  review_comment_title review_comment_message review_creation_date         
##  Length:100000        Length:100000          Min.   :2016-10-02 00:00:00  
##  Class :character     Class :character       1st Qu.:2017-09-23 00:00:00  
##  Mode  :character     Mode  :character       Median :2018-02-02 00:00:00  
##                                              Mean   :2018-01-12 17:58:10  
##                                              3rd Qu.:2018-05-15 00:00:00  
##                                              Max.   :2018-08-31 00:00:00  
##  review_answer_timestamp      
##  Min.   :2016-10-07 18:32:28  
##  1st Qu.:2017-09-27 01:19:37  
##  Median :2018-02-04 19:31:06  
##  Mean   :2018-01-15 21:30:46  
##  3rd Qu.:2018-05-20 11:00:14  
##  Max.   :2018-10-29 12:27:35
## 'data.frame':    99441 obs. of  8 variables:
##  $ order_id                     : Factor w/ 99441 levels "00010242fe8c5a6d1ba2dd792cb16214",..: 88951 32546 27770 57386 67044 63543 7553 39236 46083 89697 ...
##  $ customer_id                  : Factor w/ 99441 levels "00012a2ce6f8dcda20d059ce98491703",..: 61761 68730 25514 96584 53774 31118 92191 60518 95387 19282 ...
##  $ order_status                 : Factor w/ 8 levels "approved","canceled",..: 4 4 4 4 4 4 5 4 4 4 ...
##  $ order_purchase_timestamp     : Factor w/ 98875 levels "2016-09-04 21:15:19",..: 27602 90668 94566 35023 55222 15891 6386 9743 667 18535 ...
##  $ order_approved_at            : Factor w/ 90734 levels "","2016-09-15 12:16:38",..: 26910 83755 86887 33880 52143 15447 6368 9419 707 18033 ...
##  $ order_delivered_carrier_date : Factor w/ 81019 levels "","2016-10-08 10:34:01",..: 24736 76469 78219 31487 49664 13978 1 9013 594 17711 ...
##  $ order_delivered_customer_date: Factor w/ 95665 levels "","2016-10-11 13:46:32",..: 25876 88743 91784 33916 50233 15592 1 9155 597 18403 ...
##  $ order_estimated_delivery_date: Factor w/ 459 levels "2016-09-30 00:00:00",..: 213 412 428 252 299 159 100 121 56 175 ...
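
The output above shows the review timestamps first as factors and later as date-times, and the identifiers as character vectors. The sketch below shows one way such a conversion could be done in base R. It is a minimal illustration, not necessarily the code used for this report: the three-row `reviews` data frame is a hypothetical miniature built from values visible in the output, and the `UTC` time zone is an assumption, since the dataset's time zone is not stated here.

```r
# Hypothetical miniature of the reviews table (column names follow the output above).
reviews <- data.frame(
  review_id = c("a1", "b2", "c3"),
  review_score = c(4L, 5L, 1L),
  review_creation_date = c("2016-10-02 00:00:00", "2018-02-02 00:00:00",
                           "2018-08-31 00:00:00"),
  review_answer_timestamp = c("2016-10-07 18:32:28", "2018-02-04 19:31:06",
                              "2018-10-29 12:27:35"),
  stringsAsFactors = TRUE  # mimic reading the CSV with strings as factors
)

# Identifiers are labels, not categories, so store them as character.
reviews$review_id <- as.character(reviews$review_id)

# Convert the timestamp factors to POSIXct (time zone is an assumption).
ts_cols <- c("review_creation_date", "review_answer_timestamp")
reviews[ts_cols] <- lapply(reviews[ts_cols], function(x) {
  as.POSIXct(as.character(x), format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
})

str(reviews)
summary(reviews$review_creation_date)
```

Converting through `as.character()` first matters: calling a numeric conversion directly on a factor would return the internal level codes rather than the timestamps.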

1.6 Relationship between photo qty and order frequency within each product category

Interpretation: Based on the visualization of the data distribution above, we can conclude that there is no correlation between photo quantity and the number of orders.
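
The visual reading above could be backed by a number. The sketch below counts distinct orders per product, joins the counts to the product table, and computes a rank correlation per category. It is only an illustration under stated assumptions: the `products` and `order_items` data frames are hypothetical toy data that reuse the dataset's column names, and Spearman's coefficient is my choice here, not something the report specifies.

```r
# Hypothetical toy data that reuses the dataset's column names.
products <- data.frame(
  product_id = c("p1", "p2", "p3", "p4", "p5", "p6"),
  product_category_name = c("beleza_saude", "beleza_saude", "beleza_saude",
                            "automotivo", "automotivo", "automotivo"),
  product_photos_qty = c(1, 3, 5, 2, 4, 6),
  stringsAsFactors = FALSE
)
order_items <- data.frame(
  order_id = c("o1", "o2", "o3", "o4", "o5", "o6", "o7", "o8", "o9"),
  product_id = c("p1", "p1", "p2", "p3", "p3", "p4", "p5", "p5", "p6"),
  stringsAsFactors = FALSE
)

# Order frequency: number of distinct orders in which each product appears.
order_freq <- aggregate(order_id ~ product_id, data = order_items,
                        FUN = function(x) length(unique(x)))
names(order_freq)[2] <- "n_orders"

# Attach photo quantity and category to each product's order count.
merged <- merge(products, order_freq, by = "product_id")

# Spearman rank correlation between photo quantity and order count, per category.
cor_by_category <- sapply(
  split(merged, merged$product_category_name),
  function(d) cor(d$product_photos_qty, d$n_orders, method = "spearman")
)
print(cor_by_category)
```

On the real data, coefficients close to zero across categories would be consistent with the conclusion drawn from the plot.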

Interpretation: From the plot above, a negative correlation is identified between estimated delivery and actual delivery.
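
To make the comparison concrete, both durations can be measured in days from the purchase timestamp and then correlated. The sketch below is one possible way to do this, not necessarily how the plot was produced: the four-row `orders` data frame is hypothetical toy data with the dataset's column names, the helper `to_time()` and the `UTC` time zone are assumptions, and undelivered orders (empty delivery date) are simply dropped from the correlation.

```r
# Hypothetical toy data; the last order has no delivery date, as in the raw file.
orders <- data.frame(
  order_purchase_timestamp = c("2017-10-02 00:00:00", "2018-07-24 12:00:00",
                               "2018-08-08 06:00:00", "2017-11-18 00:00:00"),
  order_delivered_customer_date = c("2017-10-10 00:00:00", "2018-08-07 12:00:00",
                                    "2018-08-17 06:00:00", ""),
  order_estimated_delivery_date = c("2017-10-18 00:00:00", "2018-08-13 00:00:00",
                                    "2018-09-04 00:00:00", "2017-12-15 00:00:00"),
  stringsAsFactors = FALSE
)

# Illustrative helper: parse timestamps; empty strings become NA.
to_time <- function(x) {
  as.POSIXct(as.character(x), format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
}

purchased <- to_time(orders$order_purchase_timestamp)

# Durations in days, both measured from the purchase time.
orders$estimated_days <- as.numeric(
  difftime(to_time(orders$order_estimated_delivery_date), purchased, units = "days")
)
orders$actual_days <- as.numeric(
  difftime(to_time(orders$order_delivered_customer_date), purchased, units = "days")
)

# Pearson correlation over orders that were actually delivered.
r <- cor(orders$estimated_days, orders$actual_days, use = "complete.obs")
print(r)
```

The sign of `r` on the full dataset is what the interpretation above refers to; the toy values here carry no meaning beyond showing the mechanics.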

Interpretation: Based on the visualization above, a review score of 5 is obtained when the estimated delivery duration is high but the actual delivery duration is low.
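
A simple way to check this pattern is to average both durations within each review score. The sketch below assumes a data frame with one row per reviewed order and the hypothetical columns `estimated_days` and `actual_days`; the values are invented toy data for illustration only.

```r
# Hypothetical toy data: one row per reviewed order.
deliveries <- data.frame(
  review_score = c(5, 5, 1, 1),
  estimated_days = c(20, 30, 10, 12),
  actual_days = c(5, 7, 15, 25)
)

# Mean estimated and actual duration for each review score.
by_score <- aggregate(cbind(estimated_days, actual_days) ~ review_score,
                      data = deliveries, FUN = mean)
print(by_score)
```

If the interpretation holds on the real data, the row for score 5 would show a comparatively high mean `estimated_days` and a low mean `actual_days`.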

2 Project 2: Creating a publication-grade plot